Abstract

Agent-based intrusion detection systems (IDS) have advantages such as scalability, reconfigurability, and survivability. In this paper, we introduce a mobile-agent-based IDS called ABIDE (Agent Based Intrusion Detection Environment). ABIDE is composed of various types of agents, all of which are mobile, lightweight, and specialized.

The most common form of agent is the DMA (Data Mining Agent), which moves randomly around the network and collects information. The DMA then relays the information it has gathered to a DFA (Data Fusion Agent), which assesses the likelihood of intrusion. As we show in this paper, there is a quantifiable relationship between the number of DMA and the probability of detecting an intrusion. We study this relationship and its implications.
1 Introduction


An intrusion into a computer system may be indicated by abnormal network traffic, anomalous user activity, or application misbehavior. Intrusion detection systems (IDS)¹ that focus on detecting abnormal network activity are called network-based IDS, whereas those that focus on detecting abnormal host activity are called host-based IDS. In addition, some “hybrid” IDS have sensors that collect both host and network data.

Traditional IDS that use a monolithic architecture (i.e., a centralized architecture of data collection and analysis) have a variety of problems. These include a single point of failure (which is bad for survivability), a lack of scalability, and difficulty of reconfiguration. To overcome these shortcomings, agent-based IDS that are distributed, scalable, and reconfigurable have become popular [1], [2]. To take advantage of this agent-based approach, the US Naval Research Laboratory is designing a host-based intrusion detection system called ABIDE (agent based intrusion detection environment)² that uses mobile agent technology. ABIDE differs from other agent-based IDS, which usually introduce some level of coordinated communications among IDS components, in the following way:

To  avoid  a  targeted  attack  to  disable  the  IDS,  all  agents  randomly  move 
around  monitoring  hosts.  There  is  no  fixed  infrastructure,  except  that  each  host 
needs  to  be  monitored,  and  has  an  agent-platform  that  can  host  agents  when  they 
decide  to  move  in.  There  is  neither  a  central  site  for  analysis,  nor  a  scheduler  for 
agents  in  ABIDE.  Also,  to  make  the  agent  lightweight  (i.e.,  using  a  small  amount 
of  code,  which  reduces  network  overhead  associated  with  agent  movement),  tasks 
are  split  among  different  kinds  of  agents  that  perform  different  functions. 

In  ABIDE,  there  are  four  different  kinds  of  agents.  These  agents  have  an  implied  hierarchy 
for  the  purpose  of  data  and  command  flow. 

1. A data mining agent (DMA) roams around (i.e., randomly chooses hosts and moves to them) and acquires environmental information. It is small, lightweight, and specialized. For example, a DMA may be tasked to verify a checksum on an important system binary such as the Unix ps binary. If the agent finds suspicious data, it will acquire it for further analysis.

2.  A  data  fusion  agent  (DFA)  roams  around  and  randomly  interacts  with  the  various 
DMA.  It  receives  the  DMA  data  payload  and  builds  a  larger  picture  of  events  from 
this  data.  As  the  DFA  collects  data,  it  can  apply  classic  IDS  techniques  to  determine 
whether  an  intrusion  is  taking  place.  Of  course,  when  the  DMA  and  DFA  meet  up  is 
a  function  of  time  and  the  size  of  the  network. 

3. A probe agent (PA) is dispatched by a DFA to perform a test to confirm intrusion.

4. Once the DFA has decided that a system has been compromised, a corrective agent (CA) can be dispatched to take actions. The CA is the only agent empowered to change system state on the host systems.

¹ Abbreviations can be taken as either singular or plural depending upon the context.

² The ABIDE idea grew out of the work of Michael Reed while he was at NRL [3, 4].

In this paper, we focus on the first two types of agents, the DMA and the DFA. We study the probabilistic behavior of the DMA reporting to a DFA. Specifically, we are concerned with two questions:

•  Q1  —  Given  a  network  of  a  fixed  number  of  hosts  and  a  fixed  number  of  DMA,  what 
is  the  probability  of  detecting  an  intrusion? 

•  Q2  —  Given  a  network  of  a  fixed  number  of  hosts,  if  we  want  to  detect  an  intrusion 
with  a  certain  confidence,  how  many  DMA  have  to  be  deployed? 

2  Special  Case 

In this section, we consider the situation of $K$ DMA randomly visiting nodes of a network to discover various pieces of information and report this information back to one DFA. An individual piece of information that a DMA obtains may not, in itself, be enough to alert the DFA to an intrusion; however, an aggregate of the individual pieces of information collected by the DMA may alert the DFA to an intrusion. It is this threshold criterion with which we are concerned. Once this threshold is reached, the DFA deploys a PA. Our analysis stops at the decision to deploy a PA. We refer to each host that a DMA visits as a node $h_i$, $i = 1, \ldots, M$. We assume that, as each DMA randomly travels from node to node, it picks up a unique atom of information $a_i$ at each node $h_i$. In our special case, a DMA never visits the same node twice. (In reality, a DMA may visit the same node more than once, due to the randomness of its travels, but it must visit a given fixed number of unique nodes during its sojourn. We examine the simple case, which is equivalent.) For simplicity, we assume that, at a specific time, the DMA transfers its atoms to the DFA. (In reality both the DMA and the DFA randomly travel the network. When a DMA meets up with a DFA, it then transfers its atoms to the DFA.) For simplicity, we also assume that there is only one DFA. If the DFA has sufficient atoms, it declares that the intrusion threshold $\theta$ has been reached and therefore it deploys a PA.

This is similar to the threshold schemes discussed in [5], in that below the threshold level of $\theta$ one can assume no knowledge, but at or above $\theta$ the game is up. In this paper we do not discuss how $\theta$ is determined, nor do we discuss the case where, below $\theta$, the DFA has no knowledge of an intrusion. In addition, we have made further simplifying assumptions and will discuss the general situation in future work. What is salient about our work in this paper is that, even with the assumptions made for simplification, the mathematics are quite difficult to derive and computationally quite expensive to perform. We are presently investigating approximations to the formulas presented in this paper to speed up the computation and to develop “rules of thumb.”

2.1  Formalism 

We  will  now  formally  present  our  problem. 

• The network is made up of $M$ nodes $h_i$, $i = 1, \ldots, M$.

• There are $K$ DMA $A_k$, $k = 1, \ldots, K$.

• Each $A_k$ visits $n$ and only $n$ nodes, and each node is distinct. $A_k$ obtains a unique atom from each node. Every DMA that visits the same node $h_i$ receives the same atom $a_i$.

• After $A_k$ has visited $n$ nodes, it gives the $n$ (unique) atoms it holds, indexed by $i = 1, \ldots, n$, to (the single) DFA.

Note that even though $A_k$ has $n$ unique atoms, another agent $A_{k'}$ might have some of the same atoms as $A_k$. Therefore, when all of the $A_k$ have reported to the DFA, we can view the DFA as a bag of atoms, i.e., an atom might be in the DFA more than once. We are only interested in the number of unique atoms in the DFA. Note that visiting the node $h_i$ is equivalent to obtaining the atom $a_i$, so we will sometimes blur the distinction.

Let $P_K(M, n : T)$ be the probability that the DFA contains exactly $T$ unique atoms, given that $K$ agents have searched through $M$ nodes, picking $n$ (distinct) nodes per agent. Let us consider a simple example first. Keep in mind that the actual probabilistic term of interest, when a threshold $\theta$ is given, is the more complicated $\sum_{T \ge \theta} P_K(M, n : T)$. This allows us to answer Q1 in this special case.

Example 1: Say that we have a network of 5 nodes, 2 agents, and each agent visits one node. The only non-trivial choices for $T$ are 1 or 2, since 2 agents, each visiting one node, can never together visit more than 2 distinct nodes. Each run of the experiment results in an ordered pair of nodes $(h_i, h_j)$, $i = 1, \ldots, 5$, $j = 1, \ldots, 5$. There are 25 equally likely ways to pick these pairs. We easily see that there are 5 pairs of the form $(h_i, h_i)$ and 20 “distinct” pairs. Thus $P_2(5, 1 : 1) = 5/25 = .2$ and $P_2(5, 1 : 2) = 20/25 = .8$.

Example 2a: What happens now if we have 5 nodes, 2 agents, $T = 2$, but each agent visits 2 distinct nodes (hence 2 distinct atoms per agent)? We wish to determine the probability $P_2(5, 2 : 2)$. Consider the first agent $A_1$ (note that we have arbitrarily called one agent the “first”). The 2 nodes visited by $A_1$ are represented as the unordered pair $(ij)$ (thus $(ij) = (ji)$). Consider the $5 \times 5$ matrix $a_{ij}$, $i = 1, \ldots, 5$, $j = 1, \ldots, 5$. The visits of $A_1$ are represented by the upper triangular part of $a_{ij}$, excluding the diagonal. These are the 10 unordered pairs (12), (13), (14), (15), (23), (24), (25), (34), (35), (45). The only way to achieve $T = 2$ is for $A_2$ to visit exactly the same nodes as $A_1$. Since there are 10 ways for $A_1$ and $A_2$ to agree out of a total of 100 different possible visits for $A_1$ and $A_2$ (10 for each DMA when unconstrained), we see that $P_2(5, 2 : 2) = 10/100 = .1$.

Example 2b: Now we are in the same situation as Ex. 2a, except that we have $T = 4$. To achieve this, the visits of $A_1$ and $A_2$ must have a null intersection. Given any visit of $A_1$, there are always exactly 3 ways for the $A_2$ visit to have a null intersection with the given $A_1$ pick. Since there are 10 possible $A_1$ picks, there are 30 ways to achieve $T = 4$, thus $P_2(5, 2 : 4) = 30/100 = .3$. Note that since $P_2(5, 2 : 1) = 0$, $P_2(5, 2 : 2) = .1$, $P_2(5, 2 : 4) = .3$, and $P_2(5, 2 : T) = 0$ for $T > 4$, we have that $P_2(5, 2 : 3) = .6$.
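The three examples can be checked by brute force. The following is a minimal Python sketch and is not part of the original analysis: the function name `exact_distribution` is chosen here for illustration, and the enumeration is only practical for very small $M$, $n$, and $K$, because it visits every one of the $\binom{M}{n}^K$ equally likely outcomes.

```python
from fractions import Fraction
from itertools import combinations, product


def exact_distribution(M, n, K):
    """Enumerate every way K agents can each pick n distinct nodes out of M.

    Returns a dict mapping T (the number of distinct nodes visited overall)
    to its exact probability as a Fraction.
    """
    picks = list(combinations(range(M), n))  # all n-subsets of the M nodes
    counts = {}
    for choice in product(picks, repeat=K):  # one subset per agent
        T = len(set().union(*choice))        # distinct nodes across all agents
        counts[T] = counts.get(T, 0) + 1
    total = len(picks) ** K                  # size of the sample space
    return {T: Fraction(c, total) for T, c in sorted(counts.items())}


if __name__ == "__main__":
    print(exact_distribution(5, 1, 2))  # setting of Example 1
    print(exact_distribution(5, 2, 2))  # setting of Examples 2a and 2b
```

For the settings above, the enumeration should reproduce the values .2 and .8 of Example 1 and the values .1, .6, and .3 of Examples 2a and 2b.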

We see that calculating the probabilistic terms $P_K(M, n : T)$ quickly becomes quite complicated. Therefore we present a closed form solution. Each agent is considered a draw. Without any restrictions there are $\binom{M}{n}$ ways for a DMA to pick $n$ nodes out of the total of $M$ nodes. Since there are 2 draws in Ex. 1 and Ex. 2, let us start with $K = 2$. The total number of draws, without restriction, is $\binom{M}{n}^2$, which is the number of elements in the sample space. Now let us consider the event in question: the combined number of distinct nodes picked by both agents is $T$. $A_1$ has no restriction, so there are $\binom{M}{n}$ ways for $A_1$ to pick $n$ nodes. Now $A_2$ has to pick nodes so that there are exactly $T$ distinct nodes between the two agents. Since $A_1$ has chosen $n$ distinct nodes, $M - n$ nodes are left unchosen. Thus, $A_2$ must pick $T - n$ nodes from the $M - n$. $A_2$ still has $n - (T - n) = 2n - T$ nodes to pick from the $n$ that $A_1$ has chosen. Therefore there are $\binom{M-n}{T-n}\binom{n}{2n-T}$ ways for $A_2$ to choose. Combining this with the $\binom{M}{n}$ ways for $A_1$ to pick, we see that

$$P_2(M, n : T) = \frac{\binom{M}{n}\binom{M-n}{T-n}\binom{n}{2n-T}}{\binom{M}{n}^2} = \frac{\binom{M-n}{T-n}\binom{n}{2n-T}}{\binom{M}{n}}.$$

Of course, for things to make sense we must have $n \le T \le \min(M, 2n)$. Therefore we have that

$$P_2(M, n : T) = \begin{cases} \binom{M}{n}^{-1}\binom{M-n}{T-n}\binom{n}{2n-T} & n \le T \le \min(M, 2n) \\ 0 & \text{otherwise.} \end{cases}$$

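To simplify terminology we use the extended definition of the binomial coefficient $\binom{a}{b}$ as:

$$\binom{a}{b} = \begin{cases} \dfrac{a!}{(a-b)!\,b!} & a \ge b \ge 0,\ a \text{ and } b \text{ are integers} \\ 0 & \text{otherwise.} \end{cases}$$

So we see that

$$P_2(M, n : T) = \binom{M}{n}^{-1}\binom{M-n}{T-n}\binom{n}{2n-T}. \tag{1}$$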
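Eq. (1), together with the extended binomial coefficient, translates almost directly into code. A minimal Python sketch follows; the names `binom` and `p2` are illustrative and do not come from the original work.

```python
from math import comb


def binom(a, b):
    """Extended binomial coefficient: zero unless a >= b >= 0."""
    return comb(a, b) if a >= b >= 0 else 0


def p2(M, n, T):
    """P_2(M, n : T) from Eq. (1): two agents, n distinct nodes each."""
    return binom(M - n, T - n) * binom(n, 2 * n - T) / comb(M, n)


if __name__ == "__main__":
    for T in range(1, 6):
        print(T, p2(5, 2, T))
```

Because the extended coefficient vanishes outside $n \le T \le \min(M, 2n)$, no separate range check is needed in `p2`.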


Now what happens if we have 3 agents? As before, the size of the sample space is $\binom{M}{n}^3$, which is the total number of ways that 3 agents may pick $n$ nodes each. $A_1$ is unconstrained, so it has $\binom{M}{n}$ ways to pick $n$ nodes. The second agent $A_2$ has $n_2 = 0, 1, \ldots, n$ nodes in common with $A_1$. Therefore $n - n_2$ nodes picked by $A_2$ are in the remaining $M - n$ nodes left after $A_1$ picked. Therefore, for $n_2$ fixed, there are $\binom{n}{n_2}\binom{M-n}{n-n_2}$ ways for $A_2$ to choose nodes. Of course we must sum over all the different values that $n_2$ may achieve. So all together there are $\sum_{n_2=0}^{n}\binom{n}{n_2}\binom{M-n}{n-n_2}$ ways. For the third agent $A_3$, $n + (n - n_2)$ distinct nodes have already been picked by $A_1$ and $A_2$ from the $M$ nodes. Therefore $T - (n + (n - n_2))$ nodes are picked from the remaining $M - (n + (n - n_2))$, which accounts for a factor of $\binom{M-2n+n_2}{T-2n+n_2}$. But there are also the $n - (T - n - (n - n_2)) = 3n - T - n_2$ nodes that $A_3$ shares with the picks of $A_1$ and $A_2$. This results in a factor of $\binom{2n-n_2}{3n-T-n_2}$. Putting all three agents together and dividing by the number of elements in the sample space results in

$$P_3(M, n : T) = \binom{M}{n}^{-2} \sum_{n_2=0}^{n} \left\{ \binom{n}{n_2}\binom{M-n}{n-n_2}\binom{2n-n_2}{3n-T-n_2}\binom{M-2n+n_2}{T-2n+n_2} \right\}. \tag{2}$$


Of course this will only result in non-zero values for $n \le T \le \min(M, 3n)$. Similarly, for 4 agents we can derive the following formula for $P_4(M, n : T)$.

$$P_4(M, n : T) = \binom{M}{n}^{-3} \sum_{n_2, n_3 = 0}^{n} \left\{ \binom{n}{n_2}\binom{M-n}{n-n_2}\binom{2n-n_2}{n_3}\binom{M-2n+n_2}{n-n_3}\binom{3n-n_2-n_3}{4n-T-n_2-n_3}\binom{M-3n+n_2+n_3}{T-3n+n_2+n_3} \right\}.$$


In general, for $K$ picks of $n$ distinct things out of a total of $M$, the probability of picking $T$ unique items is:

$$\begin{aligned} P_K(M, n : T) = \binom{M}{n}^{-(K-1)} \sum_{n_2, \ldots, n_{K-1} = 0}^{n} \Bigg\{ & \binom{n}{n_2}\binom{M-n}{n-n_2}\binom{2n-n_2}{n_3}\binom{M-2n+n_2}{n-n_3} \cdots \\ & \binom{(K-2)n - n_2 - \cdots - n_{K-2}}{n_{K-1}}\binom{M-(K-2)n + n_2 + \cdots + n_{K-2}}{n - n_{K-1}} \\ & \binom{(K-1)n - n_2 - \cdots - n_{K-1}}{Kn - T - n_2 - \cdots - n_{K-1}}\binom{M-(K-1)n + n_2 + \cdots + n_{K-1}}{T-(K-1)n + n_2 + \cdots + n_{K-1}} \Bigg\}, \quad K > 3. \end{aligned} \tag{3}$$

Thus Eqs. (1), (2), and (3) give us $P_K(M, n : T)$ for all $K > 1$. As discussed before, concentrating solely upon the probability $P_K(M, n : T)$ is not sufficient. $P_K(M, n : T)$ is the probability of getting exactly $T$ unique atoms of information. If the information that the agents are attempting to retrieve is revealed when $T = C$, then the correct probabilistic term of interest (as previously discussed with respect to the threshold) is defined as:

$$P_K(M, n : C^+) \stackrel{\text{def}}{=} \sum_{T \ge C} P_K(M, n : T).$$

This is the probability of $K$ DMA obtaining at least $C$ unique atoms.
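The counting argument behind the formulas above depends, agent by agent, only on how many distinct nodes have been seen so far. This suggests an alternative way to tabulate the same probabilities: a recurrence over that running count instead of the nested sum. The Python sketch below is our illustration of that idea and is not the authors' method. The names `distribution` and `at_least` are hypothetical, and the sketch assumes $M \ge n$ and $K \ge 1$.

```python
from fractions import Fraction
from math import comb


def distribution(M, n, K):
    """Exact P_K(M, n : T) for every T, returned as a dict {T: Fraction}.

    Only u, the number of distinct nodes seen so far, is tracked. An agent
    that contributes j new nodes picks j of the M - u unseen nodes and
    n - j of the u seen ones: comb(M - u, j) * comb(u, n - j) ways.
    """
    total = comb(M, n)            # unrestricted picks for one agent
    dist = {n: Fraction(1)}       # after the first agent, exactly n nodes seen
    for _ in range(K - 1):
        nxt = {}
        for u, p in dist.items():
            for j in range(0, min(n, M - u) + 1):
                ways = comb(M - u, j) * comb(u, n - j)
                if ways:
                    nxt[u + j] = nxt.get(u + j, 0) + p * Fraction(ways, total)
        dist = nxt
    return dist


def at_least(M, n, K, C):
    """P_K(M, n : C+): probability that at least C unique atoms are obtained."""
    return sum(p for T, p in distribution(M, n, K).items() if T >= C)


if __name__ == "__main__":
    print(distribution(5, 2, 2))
    print(at_least(5, 2, 2, 3))
```

Under this organization the number of transitions grows roughly like $K \cdot \min(M, Kn) \cdot n$, at the price of arithmetic on large integers, so it may serve as an independent check on simulated values.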

Let us consider $P_K(M, n : T)$ and its limiting behavior for some small values of $M$, $n$, and $K$. The only non-zero probabilities are $P_K(M, n : T)$ for $n \le T \le \min(M, K \cdot n)$. Now let us consider how $P_K(M, n : T)$ behaves as $M \to \infty$. This is the situation when the agents are searching over a large network.

Figure 1: Limiting behavior, as $M$ grows, of $P_3(M, 2 : 5)$ and $P_3(M, 2 : 6)$

Let $M$ be very large with respect to $Kn$. The larger $M$ is, the smaller the chance of intersection between nodes picked by different agents. In Figure 1 we see a plot of the probabilities $P_3(M, 2 : 6)$ (approaches 1) and $P_3(M, 2 : 5)$ (approaches 0) against $M$. Of course Figure 1 deals with only a very few picks of a small number of nodes. The total number of nodes $M$ must be several orders of magnitude larger than $Kn$ before the limiting behavior becomes apparent. We will return to limiting behavior in the next subsection.
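The approach of $P_3(M, 2 : 6)$ to 1 can be illustrated numerically. With $n = 2$ and $T = 6$, only the $n_2 = 0$ term of Eq. (2) survives, leaving $\binom{M-2}{2}\binom{M-4}{2} / \binom{M}{2}^2$. The Python sketch below evaluates this expression; the function name `p3_all_distinct` is illustrative only.

```python
from fractions import Fraction
from math import comb


def p3_all_distinct(M):
    """P_3(M, 2 : 6): three agents, two nodes each, no node shared."""
    if M < 6:
        return Fraction(0)  # six distinct nodes cannot exist
    return Fraction(comb(M - 2, 2) * comb(M - 4, 2), comb(M, 2) ** 2)


if __name__ == "__main__":
    for M in (6, 10, 100, 1000, 10**4, 10**6):
        print(M, float(p3_all_distinct(M)))
```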

2.2  Some  Simulation  Results 

In this subsection we study the behavior of $P_{30}(M, 20 : T)$. Simulations are used since the time to run the closed form solution is on the order of $n^K$, and thus closed form calculations are only feasible for very small values of the various terms. Simulations of 1000 runs were sufficient for Figure 2 (in later plots we use much larger simulations). Of course one should keep in mind that theoretically $P_K(M, n : T)$ is never 0 and never 1 for $n \le T \le Kn$ when $M \ge Kn$. The simulations might have a value of 0 or 1, but this is because in reality the probability is either extremely small or extremely close to 1, respectively. Thus we will often say that a probability is “essentially” 0 or “essentially” 1. In Figure 2 we see what happens when $K = 30$ and $n = 20$. Figure 2 shows the plots of $P_{30}(M, 20 : 381)$, $P_{30}(M, 20 : 599)$, and $P_{30}(M, 20 : 600)$ for $M = 600, 1000, 10000, 10^5, 10^6, 10^7, 10^8, 10^9$. For the $P_{30}(M, 20 : T)$ case, $M$ must be at $10^9$ before we start seriously approaching the limiting probabilities.

With respect to the given $M$ values we see the following:

1. When $T = 381$, the only probability $P_{30}(M, 20 : 381)$ that is not essentially 0 is the one for $M = 600$. (We used $T = 381$ because it is a generic “intermediate” value when $K = 30$ and $n = 20$.)

2. When $T = 599$, the probability $P_{30}(M, 20 : 599)$ is essentially 0 for $M = 600, 1000, 10000$; the probability then increases, but it decreases again as $M$ grows very large.

3. When $T = 600$, which is $Kn$, the probability $P_{30}(M, 20 : 600)$ is essentially 0 for $M = 600, 1000, 10000$. The probability then increases until it essentially reaches its limiting value of 1 around $M = 10^9$.
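The text does not say how the simulations were implemented. One plausible reading, sketched here in Python under that assumption, is a direct Monte Carlo experiment: each of the $K$ DMA draws $n$ distinct nodes uniformly at random, and the number of distinct nodes over all DMA is recorded. The function name `simulate` and the `seed` parameter are our own illustrative choices.

```python
import random
from collections import Counter


def simulate(M, n, K, runs, seed=0):
    """Estimate P_K(M, n : T) by repeating the experiment `runs` times.

    Returns a dict {T: relative frequency} over the observed T values.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        seen = set()
        for _ in range(K):
            # one DMA: n distinct nodes drawn without replacement
            seen.update(rng.sample(range(M), n))
        counts[len(seen)] += 1
    return {T: c / runs for T, c in sorted(counts.items())}


if __name__ == "__main__":
    est = simulate(600, 20, 30, runs=1000)
    print(est.get(381, 0.0), est.get(599, 0.0), est.get(600, 0.0))
```

A value that is never observed simply does not appear in the result, which corresponds to the “essentially 0” entries discussed above.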


Figure 2: Limiting behavior, as $M$ grows, of $P_{30}(M, 20 : 381)$, $P_{30}(M, 20 : 599)$, and $P_{30}(M, 20 : 600)$, simulated 1000 times (horizontal axis: log base 10 of $M$).

We see that the distribution of the $T$ values indexed by $M$, $T_M(T)$ (index over $M$ and let $T$ run through its values in $P_K(M, n : T)$ with $K$, $n$ fixed), behaves like $I_{Kn}(T)$, the distribution that has probability 1 when $T = Kn$ and is zero elsewhere, as $M$ grows. To be precise:

Given $\epsilon > 0$ and any value of $T$, there exists a $\Phi$ such that $|T_M(T) - I_{Kn}(T)| < \epsilon$ for all $M > \Phi$.


We need not discuss the various types of probabilistic convergence for our needs. It suffices that $T_M(T)$ behaves like $I_{Kn}(T)$ for large $M$. We can also heuristically state this as

$$P_K(\infty, n : T) = \begin{cases} 1 & T = Kn \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$


The limiting behavior of $P_K(M, n : T)$ determines the limiting behavior of $P_K(M, n : C^+)$, which we may also state heuristically as

$$P_K(\infty, n : C^+) = \begin{cases} 1 & C \le Kn \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$


Let us continue to use $P_{30}(M, 20 : T)$ as an example. Above we have shown that for $M$ large the only value of $T$ of interest is the limiting value of $600 = 30 \cdot 20$. This agrees with our intuition. If the “universe” of the network is essentially infinite, then the different DMA do not have to be concerned with visiting the same nodes; probabilistically, it will not happen. Therefore, $T = Kn$ is the only value with non-zero probability, and that probability is of course 1. Now let us look at $M$ values near the minimum value of $M = 20$. For the problem to still make sense, $M$ must be bounded from below by $n$. Of course, when $M = n$ every DMA visits all of the nodes, and the probability collapses to

$$P_K(n, n : T) = \begin{cases} 1 & T = n \\ 0 & \text{otherwise.} \end{cases}$$

$M = n$ is the smallest that $M$ can be. What happens when $M$ is small, but not at its minimum value of $n$? Here we have 30 DMA; each DMA randomly travels through a network of $M$ nodes, selects 20 distinct nodes from the network, and then transfers the atoms of information to the DFA.
Figure 3: Plots of essentially non-zero values of $P_{30}(M, 20 : T)$ (simulated 100000 times per value of $M$), for $M = 20, 21, \ldots, 45$. Note that the simulated distributions have all their mass at $T = M$.


We wish to investigate how $P_K(M, n : T)$ behaves as $M \to n^+$. Figure 3 shows the results of simulations, run 100000 times each, of $P_{30}(M, 20 : T)$ for $M = 20, 21, \ldots, 45$. We see that, for $M$ “near and greater than” $n$, we have

$$P_K(M, n : T) = \begin{cases} \text{essentially } 1 & T = M \\ \text{essentially } 0 & \text{otherwise.} \end{cases}$$

This is because the universe is so small when $M$ is small that, with probability very close to 1, all of the nodes are chosen by the 30 DMA. The question is how “near” is “near.” In our example, the above property holds approximately for $M < 2n$; however, it does not hold much beyond. In Figure 4 we see what happens as $M$ increases from 45 to 165 in steps of 10. For $M = 55$, $P_K(M, n : T)$ has two essentially non-zero values. We stay with two values in the simulations until $M = 85$. As $M$ increases, the number of essentially non-zero probabilities increases, and when the values are connected with a curve they start to slide into a bell shape. The bell shape is very obvious in Figure 5, where we investigate $M$ in the intermediate range of 200 to 1000, in increments of 100. As $M$ increases greatly, as shown in Figure 6, the bell shape slowly “hits the wall” at $T = 600$ and finally we have the limiting behavior discussed with respect to Eq. 4. From this analysis we see that $P_K(M, n : T)$ behaves like a uni-valued distribution for $M$ small,

$$P_K(\text{small } M, n : T) = \begin{cases} \text{essentially } 1 & T = M \\ \text{essentially } 0 & \text{otherwise,} \end{cases}$$

and it is essentially uni-valued for $M$ large as given by Eq. 4. In the intermediate range the graph of $P_K(M, n : T)$ slides into a bell shape from the right as $M$ increases, then behaves like a bell shape (normal distribution), and then slides into a uni-valued distribution from the left as $M \to \infty$.
Figure 4: Plots of essentially non-zero values of $P_{30}(M, 20 : T)$ (simulated 100000 times per value of $M$) for $M = 45, 55, \ldots, 165$ (horizontal axis: $T$).

Figure 5: Plots of essentially non-zero values of $P_{30}(M, 20 : T)$ (simulated 100000 times per value of $M$), as $M$ grows.

Figure 6: Plots of essentially non-zero values of $P_{30}(M, 20 : T)$ (simulated 100000 times per value of $M$) for $M = 10^4, 10^5, \ldots, 10^9$.


2.3  Cumulative  Distributions  from  the  Simulations 

Recall that the actual term of interest is $P_K(M, n : C^+)$. We could of course just sum the results from the simulations for the $T$ values that are greater than or equal to $C$. However, we wish to exploit the bell shape of the distribution for $M$ in the intermediate range.

We do not know why the distribution has a bell shape. (We hypothesize that it is related to the normal approximation to the binomial distribution.) We are presently investigating it and we hope to discuss it with the workshop participants. With knowledge of the mean of $T$, $\mu$, and the variance of $T$, $\sigma^2$, we could easily compute the probability, for intermediate $M$, by

$$P_K(M, n : C^+) = \sum_{T \ge C} P_K(M, n : T) \approx \frac{1}{\sqrt{2\pi\sigma^2}} \int_C^\infty e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx. \tag{6}$$

With $M$, $n$, $K$ fixed, we are viewing $P_K(M, n : T)$ as the distribution of a random variable $T$. Of course this approximation introduces error by approximating a discrete mass function by a continuous density function. If $C = \mu$, then we have, independent of the value of the variance $\sigma^2$, that

$$P_K(M, n : \mu^+) \approx \frac{1}{\sqrt{2\pi\sigma^2}} \int_\mu^\infty e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1/2.$$
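The right-hand side of Eq. (6) is a normal tail probability, which can be evaluated with the complementary error function. The Python sketch below assumes that values for $\mu$ and $\sigma^2$ are supplied from elsewhere, for example as sample estimates from a simulation. The function name `normal_tail` and the numbers in the usage lines are purely illustrative.

```python
from math import erfc, sqrt


def normal_tail(C, mu, sigma2):
    """Integral from C to infinity of the N(mu, sigma2) density, as in Eq. (6)."""
    return 0.5 * erfc((C - mu) / sqrt(2.0 * sigma2))


if __name__ == "__main__":
    # mu and sigma2 here are made-up inputs, not values from the paper
    print(normal_tail(383.0, 383.0, 50.0))  # C equal to the mean
    print(normal_tail(370.0, 383.0, 50.0))  # C below the mean
```

When $C = \mu$ the function returns one half regardless of `sigma2`, matching the special case stated above.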

Note that this only holds in the intermediate range of $M$ values, which is a relative term with respect to the size of $K$ and $n$. We cannot determine how to get a computable term for the mean $T$ value from the closed form Eqs. (1), (2), and (3). However, we are able to theoretically determine an approximation to the mean. In fact, when we compare it with our simulations it seems to be better than an approximation! The problem is that the same approach does not work for the variance. In our problem a DMA must pick $n$ unique nodes. There is nothing probabilistic about the number $n$; it is a hard constraint. However, if we pick one particular node and ask “What is the probability that a particular DMA picked this node (given no other information)?” one would answer $n/M$. In our problem, knowledge of certain nodes being picked affects the conditional probability. For example, if we know that a particular DMA did not pick any of the first $M - n$ nodes, it picks node $h_{M-n+1}$ with probability 1. In other words, we cannot assume independence. Now we perform our approximation, assuming independence. For a given node, we say that a DMA has a probability of picking that node equal to $n/M$, and all of the nodes are independent. (We see that, on the average, independence does not matter. We note though that the variances derived assuming independence are larger than sample simulation variances.) Therefore the probability that a node is not picked by a DMA is $1 - (n/M)$. Hence, the probability that no DMA picks a particular node is $(1 - (n/M))^K$. So the probability that at least one DMA picks the node is $1 - (1 - (n/M))^K$. Now we are in the situation of a binomial random variable, with $M$ trials, where the probability of a success is $1 - (1 - (n/M))^K$. Therefore the mean is $M \cdot \left(1 - (1 - (n/M))^K\right)$. Hence, we use this for our approximation of the mean $T$ value. We call the approximation $F$, thus $F \approx$ mean of $T$, where

$$F = M \cdot \left(1 - (1 - (n/M))^K\right). \tag{7}$$
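Eq. (7) is a one-line computation. The Python sketch below uses the illustrative name `approx_mean`. As a small check, for the setting of Examples 2a and 2b the exact mean is $2(.1) + 3(.6) + 4(.3) = 3.2$, and Eq. (7) gives $5 \cdot (1 - (3/5)^2) = 3.2$ as well. One remark that goes beyond the text: the expected number of visited nodes is a sum of per-node probabilities, and linearity of expectation needs no independence between nodes, so there is reason to think $F$ is the exact mean rather than an approximation.

```python
def approx_mean(M, n, K):
    """F from Eq. (7): M * (1 - (1 - n/M)**K)."""
    return M * (1.0 - (1.0 - n / M) ** K)


if __name__ == "__main__":
    for M, n, K in [(500, 17, 42), (1000, 17, 42), (600, 20, 30),
                    (950, 20, 30), (1000, 20, 30), (10**4, 20, 30)]:
        print(M, n, K, round(approx_mean(M, n, K)))
```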


Table 1: mean values

distribution               # simulations   sample mean   F
$P_{42}(500, 17 : T)$      $10^4$          383           383
$P_{42}(1000, 17 : T)$     $10^{?}$        513           513
$P_{30}(600, 20 : T)$      $10^5$          383           383
$P_{30}(950, 20 : T)$      $10^5$          448           448
$P_{30}(1000, 20 : T)$     $10^5$          455           455
$P_{30}(10^4, 20 : T)$     $10^5$          583           583
$P_{30}(10^5, 20 : T)$     $10^5$          598           598
$P_{30}(10^6, 20 : T)$     $10^5$          600           600

Based on Table 1, and other data we have obtained, it seems that the approximation might actually be an equality, but we cannot prove it. There are slight differences between the $F$ values and the sample means derived from our simulations. Of course, simulation sample means are only approximations themselves. Unfortunately, since the closed form for $P_K(M, n : T)$ is so computationally expensive, we cannot use it to compare $F$ to the actual mean $\mu$ of $P_K(M, n : T)$. We note in Table 1 that Eq. 7 agrees with the limiting value of the distributions as $M$ grows and the distributions collapse to a single non-trivial value. This is because $(1 - (n/M))^K = (1 - (Kn/M)/K)^K$. Since $e^x = \lim_{K \to \infty} (1 + \frac{x}{K})^K$, we have for large $K$ that $(1 - (Kn/M)/K)^K \approx e^{-Kn/M}$. Therefore, for large $K$, $F \approx M \cdot (1 - e^{-Kn/M})$. By using the Taylor series for $e^x$ we have for very large $M$ that $F \approx Kn$.

The usefulness of $F$ is that it gives us a way of determining if the probability associated with a given threshold is more or less than 50%. Of course, if we find a way of approximating the variance we could use any probability, not just 1/2.

In Figure 7 we see the plot of $F$ against different $K$ values (only the integers make sense) for $M = 1000$ and $n = 20$. If the threshold value is above (below) $F$, then there is less (greater) than a 50% chance of detecting the intrusion.

Hence, we have developed a useful rule of thumb, for intermediate $M$, that is easily calculated from only knowing $M$, $n$, and $K$. Of course, one should keep in mind that $M$ being in the intermediate range is relative to the sizes of $K$ and $n$. For very large $M$, with moderate $n$, one would need to deploy a large number of DMA to use the cut-off regions. For non-intermediate $M$ values we can use our previous limiting results to handle the case where $M$ is either very small or very large. Thus we have some handle on the probabilistic behavior of $P_K(M, n : C^+)$.


Figure 7: 50% cut-off regions. Plot of the approximate mean $F$ against $K = 5, \ldots, 35$ for $M = 1000$, $n = 20$.

Let us go through a specific example using $F$, Figure 7, and Table 1. Consider a network of size $M = 1000$ with $K = 30$ DMA, where each DMA visits $n = 20$ nodes, and assume that an intrusion is detected as soon as the DFA has $\theta = 400$ atoms of information. Since $400 < F = 455$, the probability of detecting the intrusion is greater than 1/2. If we use a different $\theta$ that is less than 400, the probability of detecting the intrusion is even greater. On the other hand, if $\theta = 500$, we have less than a 50% chance of detecting the intrusion.
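The same rule of thumb can be turned around to give a rough answer to Q2 at the 50% confidence level: find the smallest $K$ for which $F$ reaches the threshold. The Python sketch below does this by direct search. The function names are illustrative, and the result inherits every caveat of the rule of thumb (intermediate $M$, bell-shaped distribution).

```python
def approx_mean(M, n, K):
    """F from Eq. (7): M * (1 - (1 - n/M)**K)."""
    return M * (1.0 - (1.0 - n / M) ** K)


def min_dma_for_even_odds(M, n, theta):
    """Smallest K with F >= theta, i.e. roughly a 50% detection chance."""
    if not 0 < n <= M:
        raise ValueError("need 0 < n <= M")
    if theta > M or (theta == M and n < M):
        raise ValueError("F stays below theta for every K")
    K = 1
    while approx_mean(M, n, K) < theta:
        K += 1
    return K


if __name__ == "__main__":
    print(min_dma_for_even_odds(1000, 20, 400))
    print(min_dma_for_even_odds(1000, 20, 500))
```

For $M = 1000$ and $n = 20$ this is consistent with the example above: 30 DMA exceed the number needed for $\theta = 400$ but fall short of the number needed for $\theta = 500$.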
3 More General Scenarios


We  have  seen  in  the  previous  section  that  even  for  the  simple  scenario  put  forward  we 
can  derive  a  closed  form  solution  for  the  probability,  but  it  is  not  computationally  feasible. 
Then  why  did  we  derive  it?  Intellectual  integrity  demands  that  we  attempt  to  solve  the 
problem.  We  do  not  have  the  tools  to  simplify  the  closed  form  but  we  are  working  on  it.  The 
terms  making  up  the  closed  form  are  special  functions  and  one  can  do  approximations  with 
them.  We  have  also  used  the  closed  form  to  verify  our  simulations  in  simple  cases.  Another 
important  reason  that  we  presented  the  closed  form  solution  was  to  show  that  if  the  solution 
is  so  computationally  complex,  even  in  the  simple  scenario  put  forward,  how  can  we  expect 
to  derive  and  use  a  closed  form  solution  in  more  complex  scenarios?  With  this  in  mind,  until 
we can approximate the special functions in $P_K(M, n : T)$, we suggest only simulations for
the  more  general  scenarios. 

3.1  Future  Work 

Previously  every  DMA  chose  the  same  number  of  nodes.  This  may  be  relaxed  and  the  number 
of  nodes  chosen  by  each  DMA  can  be  variable.  If  this  is  the  case  the  results  from  the  previous 
section  can  be  used  to  bound  the  probabilities  in  this  more  general  scenario. 

We also presented a scenario where all the DMA report to the DFA at a set time. What if the times are variable? This will certainly affect the number of nodes visited. One can also look at the probabilistic terms as a stochastic process where the results change in time. Certainly, if this is the case and the DMA are traveling around the network, the limiting probabilities would eventually collapse because enough nodes would have been visited.

It is not necessary that every atom of information have the same value. Perhaps some nodes' atoms should be weighted more than others? Perhaps interactions between different nodes result in different types of information.


4  Conclusion 


We  have  presented  a  model  for  a  mobile-agent  based  IDS,  called  ABIDE.  Using  ABIDE  as 
a  framework  we  have  analyzed  a  probabilistic  scenario  for  determining  if  an  intrusion  alarm 
should  be  sounded.  We  have  presented  the  closed  form  solution  and  detailed  simulation 
results  for  a  simple  scenario.  A  rule  of  thumb  has  been  obtained  for  determining  certain 
probabilistic  regions  of  interest.  We  have  also  discussed  how  our  results  can  be  used  and 
extended  to  more  complex  scenarios. 